AITopics | bayesian inverse reinforcement learning

Collaborating Authors

bayesian inverse reinforcement learning

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Approximated Variational Bayesian Inverse Reinforcement Learning for Large Language Model Alignment

Cai, Yuang, Yuan, Yuyu, Shi, Jinsheng, Lin, Qinhong

arXiv.org Artificial IntelligenceNov-14-2024

The alignment of large language models (LLMs) is crucial for generating helpful and harmless content. Existing approaches leverage preference-based human feedback data to learn the reward function and align the LLM with the feedback data. However, these approaches focus on modeling the reward difference between the chosen and rejected demonstrations, rather than directly modeling the true reward from each demonstration. Moreover, these approaches assume that the reward is only obtained at the end of the sentence, which overlooks the modeling of intermediate rewards. These issues lead to insufficient use of training signals in the feedback data, limiting the representation and generalization ability of the reward and potentially resulting in reward hacking. In this paper, we formulate LLM alignment as a Bayesian Inverse Reinforcement Learning (BIRL) problem and propose a novel training objective, Approximated Variational Alignment (AVA), to perform LLM alignment through Approximated Variational Reward Imitation Learning (AVRIL). The BIRL formulation facilitates intermediate reward modeling and direct reward modeling on each single demonstration, which enhances the utilization of training signals in the feedback data. Experiments show that AVA outperforms existing LLM alignment approaches in reward modeling, RL fine-tuning, and direct optimization.

objective, reward modeling, training objective, (12 more...)

arXiv.org Artificial Intelligence

2411.09341

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Walking the Values in Bayesian Inverse Reinforcement Learning

Bajgar, Ondrej, Abate, Alessandro, Gatsis, Konstantinos, Osborne, Michael A.

arXiv.org Artificial IntelligenceJul-15-2024

The goal of Bayesian inverse reinforcement learning (IRL) is recovering a posterior distribution over reward functions using a set of demonstrations from an expert optimizing for a reward unknown to the learner. The resulting posterior over rewards can then be used to synthesize an apprentice policy that performs well on the same or a similar task. A key challenge in Bayesian IRL is bridging the computational gap between the hypothesis space of possible rewards and the likelihood, often defined in terms of Q values: vanilla Bayesian IRL needs to solve the costly forward planning problem - going from rewards to the Q values - at every step of the algorithm, which may need to be done thousands of times. We propose to solve this by a simple change: instead of focusing on primarily sampling in the space of rewards, we can focus on primarily working in the space of Q-values, since the computation required to go from Q-values to reward is radically cheaper. Furthermore, this reversion of the computation makes it easy to compute the gradient allowing efficient sampling using Hamiltonian Monte Carlo. We propose ValueWalk - a new Markov chain Monte Carlo method based on this insight - and illustrate its advantages on several tasks.

algorithm, avril, posterior, (17 more...)

arXiv.org Artificial Intelligence

2407.10971

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

Add feedback

Bayesian Inverse Reinforcement Learning for Non-Markovian Rewards

Topper, Noah, Velasquez, Alvaro, Atia, George

arXiv.org Artificial IntelligenceJun-20-2024

Inverse reinforcement learning (IRL) is the problem of inferring a reward function from expert behavior. There are several approaches to IRL, but most are designed to learn a Markovian reward. However, a reward function might be non-Markovian, depending on more than just the current state, such as a reward machine (RM). Although there has been recent work on inferring RMs, it assumes access to the reward signal, absent in IRL. We propose a Bayesian IRL (BIRL) framework for inferring RMs directly from expert behavior, requiring significant changes to the standard framework. We define a new reward space, adapt the expert demonstration to include history, show how to compute the reward posterior, and propose a novel modification to simulated annealing to maximize this posterior. We demonstrate that our method performs well when optimizing according to its inferred reward and compares favorably to an existing method that learns exclusively binary non-Markovian rewards.

inverse reinforcement learning, reinforcement learning, reward function, (16 more...)

arXiv.org Artificial Intelligence

2406.13991

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Colorado > Boulder County > Boulder (0.04)
North America > United States > Florida > Hillsborough County > University (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Nonparametric Bayesian Inverse Reinforcement Learning for Multiple Reward Functions

Neural Information Processing SystemsMar-14-2024, 03:10:55 GMT

We present a nonparametric Bayesian approach to inverse reinforcement learning (IRL) for multiple reward functions. Most previous IRL algorithms assume that the behaviour data is obtained from an agent who is optimizing a single reward function, but this assumption is hard to guarantee in practice. Our approach is based on integrating the Dirichlet process mixture model into Bayesian IRL. We provide an efficient Metropolis-Hastings sampling algorithm utilizing the gradient of the posterior to estimate the underlying reward functions, and demonstrate that our approach outperforms previous ones via experiments on a number of problem domains.

behaviour data, reward function, trajectory, (14 more...)

Neural Information Processing Systems

Country: Asia > South Korea > Daejeon > Daejeon (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)

Add feedback

Massively Scalable Inverse Reinforcement Learning in Google Maps

Barnes, Matt, Abueg, Matthew, Lange, Oliver F., Deeds, Matt, Trader, Jason, Molitor, Denali, Wulfmeier, Markus, O'Banion, Shawn

arXiv.org Artificial IntelligenceSep-10-2023

Optimizing for humans' latent preferences remains a grand challenge in route recommendation. Prior research has provided increasingly general techniques based on inverse reinforcement learning (IRL), yet no approach has been successfully scaled to world-sized routing problems with hundreds of millions of states and demonstration trajectories. In this paper, we provide methods for scaling IRL using graph compression, spatial parallelization, and problem initialization based on dominant eigenvectors. We revisit classic algorithms and study them in a large-scale setting, and make the key observation that there exists a trade-off between the use of cheap, deterministic planners and expensive yet robust stochastic policies. We leverage this insight in Receding Horizon Inverse Planning (RHIP), a new generalization of classic IRL algorithms that provides fine-grained control over performance trade-offs via its planning horizon. Our contributions culminate in a policy that achieves a 16-24% improvement in global route quality, and to the best of our knowledge, represents the largest instance of IRL in a real-world setting to date. Benchmark results show critical benefits to more sustainable modes of transportation, where factors beyond journey time play a substantial role. We conclude by conducting an ablation study of key components, presenting negative results from alternative eigenvalue solvers, and identifying opportunities to further improve scalability via IRL-specific batching strategies.

inverse reinforcement learning, learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2305.1129

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France > Grand Est > Bas-Rhin > Strasbourg (0.04)
Asia > Philippines > Luzon > National Capital Region > City of Manila (0.04)
(17 more...)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Games > Computer Games (0.67)
Transportation > Ground > Road (0.46)
Information Technology > Services (0.41)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

MAP Inference for Bayesian Inverse Reinforcement Learning

Neural Information Processing SystemsApr-6-2023, 12:52:50 GMT

The difficulty in inverse reinforcement learning (IRL) arises in choosing the best reward function since there are typically an infinite number of reward functions that yield the given behaviour data as optimal. Using a Bayesian framework, we address this challenge by using the maximum a posteriori (MAP) estimation for the reward function, and show that most of the previous IRL algorithms can be modeled into our framework. We also present a gradient method for the MAP estimation based on the (sub)differentiability of the posterior distribution. We show the effectiveness of our approach by comparing the performance of the proposed method to those of the previous algorithms.

bayesian inverse reinforcement learning, map inference, reward function, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

MAP Inference for Bayesian Inverse Reinforcement Learning

Choi, Jaedeug, Kim, Kee-eung

Neural Information Processing SystemsFeb-14-2020, 23:27:19 GMT

bayesian inverse reinforcement learning, map inference, reward function, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Nonparametric Bayesian Inverse Reinforcement Learning for Multiple Reward Functions

Choi, Jaedeug, Kim, Kee-eung

Neural Information Processing SystemsDec-31-2012

We present a nonparametric Bayesian approach to inverse reinforcement learning (IRL) for multiple reward functions. Most previous IRL algorithms assume that the behaviour data is obtained from an agent who is optimizing a single reward function, but this assumption is hard to be met in practice. Our approach is based on integrating the Dirichlet process mixture model into Bayesian IRL. We provide an efficient Metropolis-Hastings sampling algorithm utilizing the gradient of the posterior to estimate the underlying reward functions, and demonstrate that our approach outperforms the previous ones via experiments on a number of problem domains.

Add feedback